In [1]:
import graphlab

In [2]:
image_train = graphlab.SFrame('image_train_data/')


[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1489772089.log

In [3]:
image_test = graphlab.SFrame('image_test_data/')

In [4]:
graphlab.canvas.set_target('ipynb')

In [5]:
image_train['image'].show()


Train a classifier on the raw image pixels


In [8]:
raw_pixel_model = graphlab.logistic_classifier.create(image_train, target='label', features=['image_array'])


PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.

WARNING: The number of feature dimensions in this problem is very large in comparison with the number of examples. Unless an appropriate regularization value is set, this model may not provide accurate predictions for a validation/test set.
Logistic regression:
--------------------------------------------------------
Number of examples          : 1893
Number of classes           : 4
Number of feature columns   : 1
Number of unpacked features : 3072
Number of coefficients    : 9219
Starting L-BFGS
--------------------------------------------------------
+-----------+----------+-----------+--------------+-------------------+---------------------+
| Iteration | Passes   | Step size | Elapsed Time | Training-accuracy | Validation-accuracy |
+-----------+----------+-----------+--------------+-------------------+---------------------+
| 1         | 6        | 0.000014  | 5.715820     | 0.306920          | 0.276786            |
| 2         | 8        | 1.000000  | 8.185219     | 0.406233          | 0.339286            |
| 3         | 9        | 1.000000  | 9.613076     | 0.421553          | 0.401786            |
| 4         | 10       | 1.000000  | 11.054109    | 0.446381          | 0.392857            |
| 5         | 11       | 1.000000  | 12.598552    | 0.449023          | 0.401786            |
| 6         | 12       | 1.000000  | 13.880117    | 0.474379          | 0.428571            |
| 7         | 13       | 1.000000  | 15.109470    | 0.483360          | 0.446429            |
| 8         | 14       | 1.000000  | 16.260115    | 0.483888          | 0.437500            |
| 9         | 15       | 1.000000  | 17.701709    | 0.486529          | 0.446429            |
| 10        | 16       | 1.000000  | 19.146483    | 0.487058          | 0.446429            |
+-----------+----------+-----------+--------------+-------------------+---------------------+
TERMINATED: Iteration limit reached.
This model may not be optimal. To improve it, consider increasing `max_iterations`.
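The counts in the training log can be checked by hand. The sketch below assumes each image is 32 x 32 pixels with 3 colour channels, and that the model keeps one weight per feature plus an intercept for every class except a reference class. Both are assumptions; they are consistent with the 3072 unpacked features and 9219 coefficients reported above, but the log does not state them. The row count of 2005 comes from len(image_train) later in this notebook.

```python
# Assumed image shape: 32 x 32 pixels, 3 colour channels (not stated in the log).
width, height, channels = 32, 32, 3
n_features = width * height * channels

# Assumed parametrisation: (features + intercept) weights for each class
# except one reference class.
n_classes = 4
n_coefficients = (n_features + 1) * (n_classes - 1)

# Rows held out for validation: total rows minus the examples used for training.
n_rows = 2005   # len(image_train)
n_train = 1893  # "Number of examples" in the log
n_validation = n_rows - n_train

print("unpacked features: %d" % n_features)
print("coefficients: %d" % n_coefficients)
print("validation rows: %d" % n_validation)
```

With far more coefficients than training examples, the warning about regularization in the log is expected.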

Make predictions with the raw-pixel model


In [12]:
image_test[0:3]['image'].show()



In [13]:
image_test[0:3]['label']


Out[13]:
dtype: str
Rows: 3
['cat', 'automobile', 'cat']

In [14]:
raw_pixel_model.predict(image_test[0:3])


Out[14]:
dtype: str
Rows: 3
['bird', 'cat', 'bird']
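Comparing Out[13] with Out[14] shows how the raw-pixel model did on these three images. A minimal sketch in plain Python, with the two lists copied from the outputs above (the variable names are only illustrative):

```python
# Labels and predictions copied from Out[13] and Out[14].
actual = ['cat', 'automobile', 'cat']
predicted = ['bird', 'cat', 'bird']

# Pair each true label with its prediction and count the matches.
matches = [a == p for a, p in zip(actual, predicted)]
n_correct = sum(matches)

print("correct predictions: %d of %d" % (n_correct, len(actual)))
```

Three images are far too few to judge the model, which is why the next step evaluates it on the whole test set.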

Evaluate the model on the full test set


In [15]:
raw_pixel_model.evaluate(image_test)


Out[15]:
{'accuracy': 0.454, 'auc': 0.7109694166666688, 'confusion_matrix': Columns:
 	target_label	str
 	predicted_label	str
 	count	int
 
 Rows: 16
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |     bird     |       cat       |  150  |
 |     bird     |    automobile   |   80  |
 |  automobile  |    automobile   |  549  |
 |     dog      |       dog       |   94  |
 |     cat      |       dog       |   47  |
 |     dog      |       bird      |  499  |
 |     cat      |    automobile   |  111  |
 |     cat      |       cat       |  425  |
 |     bird     |       bird      |  748  |
 |  automobile  |       bird      |  258  |
 +--------------+-----------------+-------+
 [16 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns., 'f1_score': 0.4212807450491278, 'log_loss': 1.2722231665636015, 'precision': 0.5007986607806926, 'recall': 0.454, 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 	class	int
 
 Rows: 400004
 
 Data:
 +-----------+-----+-----+------+------+-------+
 | threshold | fpr | tpr |  p   |  n   | class |
 +-----------+-----+-----+------+------+-------+
 |    0.0    | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   1e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   2e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   3e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   4e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   5e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   6e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   7e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   8e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   9e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 +-----------+-----+-----+------+------+-------+
 [400004 rows x 6 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}
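The reported accuracy can be cross-checked against the confusion matrix: all four diagonal entries (target_label equal to predicted_label) happen to appear in the printed head. The sketch below assumes every class has 1000 test images; the roc_curve table shows p = 1000 and n = 3000 only for class 0, so the equal split is an assumption.

```python
# Diagonal entries of the confusion matrix, copied from Out[15].
correct = {'automobile': 549, 'bird': 748, 'cat': 425, 'dog': 94}

# Assumed: 1000 test images per class (shown for class 0 only in roc_curve).
per_class_total = 1000.0

# Overall accuracy: correctly classified images over all test images.
accuracy = sum(correct.values()) / (per_class_total * len(correct))

# Per-class recall: fraction of each class's images that were labelled correctly.
recall = dict((label, n / per_class_total) for label, n in correct.items())

print("accuracy: %.3f" % accuracy)
for label in sorted(recall):
    print("recall for %s: %.3f" % (label, recall[label]))
```

Under that assumption the result agrees with the 'accuracy' value of 0.454, and the per-class numbers show where the model is weakest: very few dogs are recognised, and most of them are labelled as birds.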

Can we use deep features to improve the model?


In [16]:
len(image_train)


Out[16]:
2005

In [18]:
deep_learning_model = graphlab.load_model('imagenet_model')


---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-18-b9630e5963fc> in <module>()
----> 1 deep_learning_model = graphlab.load_model('imagenet_model')

/home/quoniam/anaconda2/envs/gl-env/lib/python2.7/site-packages/graphlab/toolkits/_model.pyc in load_model(location)
     80     if not dir_archive_exists:
     81         # Not a ToolkitError so try unpickling the model.
---> 82         unpickler = gl_pickle.GLUnpickler(location)
     83 
     84         # Get the version

/home/quoniam/anaconda2/envs/gl-env/lib/python2.7/site-packages/graphlab/_gl_pickle.pyc in __init__(self, filename)
    464                            _os.path.expandvars(filename)))
    465             if not _os.path.exists(filename):
--> 466                 raise IOError('%s is not a valid file name.' % filename)
    467 
    468         # GLC 1.3 Pickle file

IOError: /home/quoniam/version-control/happy-machine-learning/was_ML_coursera/week6/imagenet_model is not a valid file name.
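The IOError means that no file or directory named imagenet_model exists next to the notebook, so the model has to be downloaded or copied there before it can be loaded. A minimal sketch of a guard that checks for the path first; resolve_local_model is a hypothetical helper, not part of GraphLab Create, and it only covers local paths.

```python
import os


def resolve_local_model(location):
    """Return the absolute path of `location` if it exists locally, else None.

    Hypothetical helper for this notebook, not a GraphLab Create function.
    """
    path = os.path.abspath(os.path.expanduser(os.path.expandvars(location)))
    return path if os.path.exists(path) else None


model_path = resolve_local_model('imagenet_model')
if model_path is None:
    # Nothing to load yet: fetch or copy the saved model first.
    print("'imagenet_model' was not found in the working directory.")
else:
    import graphlab  # imported here because it is only needed once the model exists
    deep_learning_model = graphlab.load_model(model_path)
```

Checking up front replaces the long traceback with a short message that says what is missing.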
